An exploration of the capabilities and applications of diffusion models
In this part, I gained access to DeepFloyd IF and downloaded the precomputed text embeddings for the provided prompts. For the three text prompts, I display the caption and the model output with num_inference_steps set to 10, 20, and 40. Here are the results:
From the results, we can see that with 10 steps the images are still noticeably noisy, while there is no obvious difference in quality between 20 and 40 steps. The random seed used throughout is 180.
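For reference, here is a minimal sketch of how this sampling can be done with the diffusers library; prompt_embeds and neg_embeds are hypothetical names standing in for the downloaded embeddings.

```python
import torch
from diffusers import DiffusionPipeline

# Stage 1 of DeepFloyd IF, loaded in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.to("cuda")

torch.manual_seed(180)  # the seed used throughout this section
for steps in (10, 20, 40):
    # prompt_embeds / neg_embeds: the precomputed text embeddings (hypothetical names)
    image = pipe(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=neg_embeds,
        num_inference_steps=steps,
    ).images[0]
    image.save(f"sample_{steps}_steps.png")
```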
First, I implemented the forward process, the forward(im, t) function, which adds random noise to the image with strength determined by the timestep t. The test image at noise levels 250, 500, and 750 is shown below:
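A minimal sketch of this function, assuming alphas_cumprod is the scheduler's cumulative-product noise schedule (as exposed by diffusers schedulers):

```python
import torch

def forward(im: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return alpha_bar.sqrt() * im + (1 - alpha_bar).sqrt() * eps
```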
Then, I tried using Gaussian blur filtering to remove the noise. The result is shown below:
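This classical baseline is a one-liner with torchvision, where noisy_im is the output of forward() above; the kernel size and sigma here are illustrative, not the exact values I used:

```python
import torchvision.transforms.functional as TF

# Gaussian blur removes high-frequency noise, but destroys image detail too.
blurred = TF.gaussian_blur(noisy_im, kernel_size=7, sigma=2.0)
```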
I then used the diffusion model for one-step denoising of the test image at noise levels 250, 500, and 750: the UNet predicts the noise in the image, which can be removed in a single step to estimate the clean image. The results are shown below:
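A sketch of this step, assuming the stage-1 UNet from the pipeline above; note that DeepFloyd's UNet stacks its noise and variance predictions along the channel dimension, so only the first three channels are the noise estimate:

```python
import torch

def one_step_denoise(x_t, t, unet, prompt_embeds, alphas_cumprod):
    with torch.no_grad():
        # keep the first 3 channels: the noise estimate
        eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    alpha_bar = alphas_cumprod[t]
    # solve the forward-process equation for x_0
    return (x_t - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
```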
Next, I implemented iterative denoising. At each step, the model estimates the noise and hence the clean image, and we move partway toward that estimate, so the denoised image becomes progressively more accurate. The result is shown below:
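A sketch of the loop over a strided list of timesteps (e.g. 990 down to 0); the small added-noise term of the DDPM update is omitted for brevity:

```python
import torch

def iterative_denoise(x_t, timesteps, unet, prompt_embeds, alphas_cumprod):
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_bar, a_bar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = a_bar / a_bar_next  # effective alpha for this stride
        beta = 1 - alpha
        with torch.no_grad():
            eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        # clean-image estimate from the current noise prediction
        x0 = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        # interpolate between the clean estimate and the current noisy image
        x_t = (a_bar_next.sqrt() * beta / (1 - a_bar)) * x0 \
            + (alpha.sqrt() * (1 - a_bar_next) / (1 - a_bar)) * x_t
    return x_t
```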
I then used the model to generate images from scratch, starting the iterative denoiser from pure noise. Here are five images generated:
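With the sketch above, generation from scratch reduces to starting from pure noise at the largest timestep (64x64 being the stage-1 resolution):

```python
import torch

x_T = torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16)  # pure noise
sample = iterative_denoise(x_T, timesteps, unet, prompt_embeds, alphas_cumprod)
```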
To improve the generated images, I applied Classifier-Free Guidance (CFG). Instead of using the conditional noise estimate directly at each step, I extrapolated it away from the unconditional one: eps = eps_uncond + gamma * (eps_cond - eps_uncond), with guidance scale gamma > 1. The results are shown below:
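A sketch of the guided noise estimate, with gamma = 7 as an illustrative scale:

```python
import torch

def cfg_noise(x_t, t, unet, cond_embeds, uncond_embeds, guidance_scale=7.0):
    with torch.no_grad():
        eps_cond = unet(x_t, t, encoder_hidden_states=cond_embeds).sample[:, :3]
        eps_uncond = unet(x_t, t, encoder_hidden_states=uncond_embeds).sample[:, :3]
    # extrapolate away from the unconditional estimate
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```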
Interestingly, the results of CFG are mostly human faces, possibly because the training data predominantly features human faces.
I took the test image, added noise to it, and then denoised it. I experimented with different amounts of noise, setting i_start to 1, 3, 5, 7, 10, and 20; a smaller i_start means more noise and hence a larger departure from the original. The results are shown below:
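A sketch of this noise-then-denoise procedure, reusing the forward() and iterative_denoise() sketches above:

```python
def sdedit(im, i_start, timesteps, unet, prompt_embeds, alphas_cumprod):
    # Noise the input to timesteps[i_start], then denoise from there.
    t = timesteps[i_start]
    x_t = forward(im, t, alphas_cumprod)
    return iterative_denoise(x_t, timesteps[i_start:], unet,
                             prompt_embeds, alphas_cumprod)
```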
I also applied this method to my own images. The results are shown below:
Besides the test image and my own images, I applied the method to a web image and two hand-drawn images. The results are shown below:
I used the same procedure to implement inpainting, following the RePaint paper. Given an image and a binary mask m, we can create a new image that retains the original content where m is 0 and generates new content where m is 1: after every denoising step, the pixels where m is 0 are forced back to the original image, noised to the current timestep. The results are shown below, including the test image and two of my own images:
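A sketch of this loop, built on the same update as iterative_denoise() above:

```python
import torch

def inpaint(im, mask, timesteps, unet, prompt_embeds, alphas_cumprod):
    x_t = forward(im, timesteps[0], alphas_cumprod)  # near-pure noise at the first step
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_bar, a_bar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = a_bar / a_bar_next
        beta = 1 - alpha
        with torch.no_grad():
            eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        x0 = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        x_t = (a_bar_next.sqrt() * beta / (1 - a_bar)) * x0 \
            + (alpha.sqrt() * (1 - a_bar_next) / (1 - a_bar)) * x_t
        # force the region where mask == 0 back to the (noised) original
        x_t = mask * x_t + (1 - mask) * forward(im, t_next, alphas_cumprod)
    return x_t
```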
I then applied SDEdit with a text prompt guiding the denoising. The test image was denoised with the prompt "a photo of a rocket". The result is shown below:
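In code, this is just the sdedit() sketch above called with the rocket prompt's embedding; rocket_embeds is a hypothetical name and i_start = 10 is illustrative:

```python
result = sdedit(test_im, i_start=10, timesteps=timesteps, unet=unet,
                prompt_embeds=rocket_embeds, alphas_cumprod=alphas_cumprod)
```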
Applied to my own images:
I created optical illusions (visual anagrams) using diffusion models: at each step, the noise is estimated once on the upright image with one prompt and once on the flipped image with the other prompt, and the two estimates (the second flipped back) are averaged. In this part, I generated an image that looks like "an oil painting of people around a campfire", but when flipped upside down reveals "an oil painting of an old man". The result is shown below:
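A sketch of the combined noise estimate, which can be plugged into the iterative denoiser in place of the plain UNet prediction:

```python
import torch

def anagram_noise(x_t, t, unet, embeds1, embeds2):
    with torch.no_grad():
        # prompt 1 on the upright image
        eps1 = unet(x_t, t, encoder_hidden_states=embeds1).sample[:, :3]
        # prompt 2 on the vertically flipped image
        flipped = torch.flip(x_t, dims=[2])
        eps2 = unet(flipped, t, encoder_hidden_states=embeds2).sample[:, :3]
    # flip the second estimate back and average
    return (eps1 + torch.flip(eps2, dims=[2])) / 2
```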
Other results are shown below:
In this part, I implemented Factorized Diffusion and created hybrid images, similar to those in Project 2: the low frequencies of the noise estimate come from one prompt and the high frequencies from another, so the image reads differently up close and from a distance. The results are shown below:
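A sketch of the factorized noise estimate; the Gaussian kernel size and sigma are illustrative choices:

```python
import torch
import torchvision.transforms.functional as TF

def hybrid_noise(x_t, t, unet, embeds_far, embeds_near,
                 kernel_size=33, sigma=2.0):
    with torch.no_grad():
        eps_far = unet(x_t, t, encoder_hidden_states=embeds_far).sample[:, :3]
        eps_near = unet(x_t, t, encoder_hidden_states=embeds_near).sample[:, :3]
    # low frequencies from the "far" prompt, high frequencies from the "near" one
    lowpass = TF.gaussian_blur(eps_far, kernel_size, sigma)
    highpass = eps_near - TF.gaussian_blur(eps_near, kernel_size, sigma)
    return lowpass + highpass
```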